Disk buffers #109

camdencheek · 2020-09-03T18:05:38Z

Description of Changes

Adds a disk buffer, a new memory buffer, and a flusher.

Things that could use specific review effort (obviously in addition to anything you'd like to add):

- Naming. Names for config options are hard, and I'd like your opinions on what I've got.
- Trying to break it. I'd love it if this were run on a few systems other than mine, with different workloads.
- Thoughts on ergonomics. I split the existing buffer-flusher into a buffer and a flusher. This makes it more modular and allows us to swap out the flusher for special cases (cabin might need to eventually), but it also makes building and starting them slightly more complex.
- Ideas for test cases that I missed.
- Soundness concerns.
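To make the buffer/flusher split concrete, here is a minimal sketch of separating storage from draining. The `Buffer` interface, `MemoryBuffer`, `Flusher`, `Drain`, and `ErrFull` names are hypothetical illustrations, not the PR's actual API. The sketch is not safe for concurrent use, and it drops a chunk whose flush fails, which a real buffer would need to keep for retry.

```go
package main

import (
	"errors"
	"fmt"
)

// Entry is a stand-in for a log entry (hypothetical; not the real entry type).
type Entry struct {
	Record string
}

// Buffer stores entries until something drains them (hypothetical interface).
type Buffer interface {
	Add(e Entry) error
	// Read removes and returns up to n entries from the front of the buffer.
	Read(n int) []Entry
}

// ErrFull is returned when the buffer has no room for another entry.
var ErrFull = errors.New("buffer is full")

// MemoryBuffer is a bounded FIFO held in memory. Not safe for concurrent use.
type MemoryBuffer struct {
	entries  []Entry
	maxItems int
}

func NewMemoryBuffer(maxItems int) *MemoryBuffer {
	return &MemoryBuffer{maxItems: maxItems}
}

func (m *MemoryBuffer) Add(e Entry) error {
	if len(m.entries) >= m.maxItems {
		return ErrFull
	}
	m.entries = append(m.entries, e)
	return nil
}

func (m *MemoryBuffer) Read(n int) []Entry {
	if n > len(m.entries) {
		n = len(m.entries)
	}
	out := make([]Entry, n)
	copy(out, m.entries[:n])
	m.entries = m.entries[n:]
	return out
}

// Flusher drains any Buffer in chunks and hands each chunk to Flush, so an
// output can swap in a different flusher without changing the buffer.
type Flusher struct {
	ChunkSize int
	Flush     func(chunk []Entry) error
}

// Drain flushes until the buffer is empty and returns how many entries were flushed.
// A chunk whose Flush call fails is lost here; a production buffer would retain it.
func (f *Flusher) Drain(b Buffer) (int, error) {
	total := 0
	for {
		chunk := b.Read(f.ChunkSize)
		if len(chunk) == 0 {
			return total, nil
		}
		if err := f.Flush(chunk); err != nil {
			return total, err
		}
		total += len(chunk)
	}
}

func main() {
	buf := NewMemoryBuffer(10)
	for i := 0; i < 5; i++ {
		_ = buf.Add(Entry{Record: fmt.Sprintf("log %d", i)})
	}
	var batches [][]Entry
	f := &Flusher{ChunkSize: 2, Flush: func(c []Entry) error {
		batches = append(batches, c)
		return nil
	}}
	n, err := f.Drain(buf)
	fmt.Println(n, err, len(batches)) // 5 <nil> 3
}
```

Because `Drain` only depends on the `Buffer` interface, a disk-backed buffer could be flushed by the same `Flusher`, which is the modularity argument above; the cost is that callers must construct and start two components instead of one.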

To keep the scope somewhat limited, I've left a few features undone:

- Configuring backoff. Right now it has sane defaults, but I'd like to make deciding which backoff parameters to expose a separate task.
- Default disk buffer paths. Right now a path must be specified manually, but ideally we'd have a "log agent data dir" to default to. This feels like a separate feature that needs its own design, especially regarding how it interacts with the universal agent.
- Configurable behavior "on full".

Please check that the PR fulfills these requirements

- Tests for the changes have been added (for bug fixes / features)
- Docs have been added / updated (for bug fixes / features)
- Add a changelog entry (for non-trivial bug fixes / features)
- CI passes

djaglowski · 2020-09-03T18:14:27Z

Log Files	Logs / Second	CPU Avg (%)	CPU Avg Δ (%)	Memory Avg (MB)	Memory Avg Δ (MB)
1	1000	2.2069383	-1.7069187	129.64372	+94.56816
1	5000	7.4310756	-1.9829268	134.68602	+89.655304
1	10000	15.172765	-0.8103008	144.37419	+88.362335
1	50000	79.868065	+7.7787704	226.3711	+9.071793
1	100000	144.62514	+1.068634	425.12527	+118.792694
10	100	2.6034954	-2.9828842	129.91621	+96.15652
10	500	8.569094	-3.3449135	138.60951	+96.021286
10	1000	16.48351	-2.6203403	145.4534	+90.07503
10	5000	82.80417	+5.930458	223.56506	+45.546616
10	10000	156.13608	+0.023147583	384.66406	+52.43521

codecov · 2020-09-03T18:14:30Z

Codecov Report

Merging #109 into master will decrease coverage by 0.30%.
The diff coverage is 70.77%.

@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage   72.30%   72.00%   -0.30%     
==========================================
  Files          75       78       +3     
  Lines        4538     5050     +512     
==========================================
+ Hits         3281     3636     +355     
- Misses        973     1056      +83     
- Partials      284      358      +74

Impacted Files	Coverage Δ
operator/builtin/output/elastic.go	`16.04% <0.00%> (-1.31%)`	⬇️
operator/duration.go	`70.27% <0.00%> (-6.20%)`	⬇️
operator/helper/input.go	`74.36% <0.00%> (-4.59%)`	⬇️
operator/helper/parser.go	`83.05% <0.00%> (-3.16%)`	⬇️
operator/buffer/disk_metadata.go	`42.86% <42.86%> (ø)`
...perator/builtin/output/googlecloud/google_cloud.go	`48.63% <46.67%> (+2.20%)`	⬆️
commands/offsets.go	`64.71% <50.00%> (-1.96%)`	⬇️
entry/record_field.go	`90.24% <66.67%> (ø)`
entry/field.go	`81.82% <71.43%> (+0.21%)`	⬆️
operator/buffer/disk.go	`76.47% <76.47%> (ø)`
... and 13 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4062c97...e003d36.

djaglowski

I've only reviewed the interface and docs so far, but will circle back to the rest later today. Here's my feedback on what I've reviewed.

docs/operators/elastic_output.md

docs/operators/google_cloud_output.md

docs/types/buffer.md

djaglowski · 2020-09-03T18:53:58Z

docs/types/buffer.md

+
+| Field      | Default             | Description                                                                                                                              |
+| ---        | ---                 | ---                                                                                                                                      |
+| `max_size` | `4294967296` (4GiB) | The maximum size of the disk buffer file in bytes                                                                                        |


Although this is documented well here, it might be slightly clearer to call this `max_bytes`, since many users will only encounter it in a config file and would otherwise have to either assume the units or dig up the docs.

camdencheek

I initially thought the same thing, but I'd love to support unit-suffixed values like `8GB` as a future feature. Would `max_bytes` still make sense if the value is `8GB`?

djaglowski

That's a good point. I guess it comes down to how likely we are to implement that feature and how soon. If we're going to live with the current implementation for a while, I think we should consider using the more explicit term now and deprecating it later. We could easily support both and require that at most one of the two be specified.
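The compromise above (accept both keys, but allow at most one) is simple to enforce when the config is built. Here is a minimal sketch, assuming a hypothetical `max_bytes` alias next to the existing `max_size`, with both fields held as strings so they can carry unit suffixes such as `8GB`. The `DiskBufferConfig` struct, `Resolve`, `parseSize`, and the suffix table are assumptions, not what the PR implements; decimal (`GB`) and binary (`GiB`) units are kept distinct, and the default is the documented 4294967296 bytes (4GiB).

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// defaultMaxSize matches the documented default of 4GiB.
const defaultMaxSize = 4 << 30

// unitMultipliers maps hypothetical size suffixes to byte multipliers.
var unitMultipliers = map[string]int64{
	"": 1, "B": 1,
	"KB": 1000, "MB": 1000 * 1000, "GB": 1000 * 1000 * 1000,
	"KiB": 1 << 10, "MiB": 1 << 20, "GiB": 1 << 30,
}

// parseSize turns "4294967296", "8GB", or "4GiB" into a byte count.
func parseSize(s string) (int64, error) {
	s = strings.TrimSpace(s)
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == 0 {
		return 0, fmt.Errorf("invalid size %q: missing number", s)
	}
	n, err := strconv.ParseInt(s[:i], 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid size %q: %w", s, err)
	}
	mult, ok := unitMultipliers[strings.TrimSpace(s[i:])]
	if !ok {
		return 0, fmt.Errorf("invalid size %q: unknown unit", s)
	}
	if n > math.MaxInt64/mult {
		return 0, fmt.Errorf("invalid size %q: overflows int64", s)
	}
	return n * mult, nil
}

// DiskBufferConfig accepts either key (hypothetical alias shown for illustration).
type DiskBufferConfig struct {
	MaxSize  string `yaml:"max_size"`  // existing key
	MaxBytes string `yaml:"max_bytes"` // hypothetical explicit alias
}

// Resolve returns the limit in bytes and rejects configs that set both keys.
func (c DiskBufferConfig) Resolve() (int64, error) {
	switch {
	case c.MaxSize != "" && c.MaxBytes != "":
		return 0, errors.New("specify at most one of max_size and max_bytes")
	case c.MaxBytes != "":
		return parseSize(c.MaxBytes)
	case c.MaxSize != "":
		return parseSize(c.MaxSize)
	default:
		return defaultMaxSize, nil
	}
}

func main() {
	for _, c := range []DiskBufferConfig{
		{},
		{MaxSize: "4294967296"},
		{MaxBytes: "8GB"},
		{MaxSize: "4GiB", MaxBytes: "8GB"},
	} {
		n, err := c.Resolve()
		fmt.Println(n, err)
	}
}
```

With a unit-aware parser, the explicit `max_bytes` name matters less, since a value like `8GB` is self-describing; the mutual-exclusion check keeps the deprecation path open either way.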

docs/types/flusher.md

djaglowski · 2020-09-03T20:37:40Z

Log Files	Logs / Second	CPU Avg (%)	CPU Avg Δ (%)	Memory Avg (MB)	Memory Avg Δ (MB)
1	1000	2.0517936	-1.8620634	128.79364	+93.71808
1	5000	7.5174274	-1.896575	136.77924	+91.74852
1	10000	15.207217	-0.7758484	144.08014	+88.06828
1	50000	78.463455	+6.374161	228.28516	+10.985855
1	100000	147.36679	+3.8102875	420.82773	+114.49515
10	100	2.6034832	-2.9828963	129.60587	+95.84617
10	500	8.344985	-3.5690222	136.75606	+94.16783
10	1000	16.759014	-2.3448353	145.37177	+89.99339
10	5000	82.11762	+5.2439117	206.09819	+28.079742
10	10000	153.5577	-2.5552368	369.03366	+36.80481

djaglowski · 2020-09-03T20:49:06Z

operator/buffer/disk.go

+	defer d.Unlock()
+	defer func() { d.lastCompaction = time.Now() }()
+
+	// So how does this work? The goal here is to remove all flushed entries from disk,


djaglowski

This looks great to me. Nice job on the detailed design. Awesome test coverage.

camdencheek added 2 commits September 3, 2020 14:02

Add disk buffers

5b6b775

Tidy dependencies

59b2bb6

camdencheek requested review from djaglowski and jmwilliams89 September 3, 2020 18:05

Increase timeout

7b7f14c

djaglowski requested changes Sep 3, 2020

Update docs with feedback

e003d36

djaglowski reviewed Sep 3, 2020

djaglowski approved these changes Sep 3, 2020

jmwilliams89 approved these changes Sep 8, 2020

camdencheek merged commit d67cf06 into master Sep 9, 2020

camdencheek deleted the disk-buffer branch September 9, 2020 13:15
